{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BuhjNPTpju5n"
      },
      "source": [
        "# Gemini API: Embeddings Quickstart"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sUsgeyPu6ogK"
      },
      "source": [
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ddZb9-z46mM5"
      },
      "source": [
        "The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating point numbers that represent the meaning of a word, sentence, or paragraph. You can use embeddings in many downstream applications like document search.\n",
        "\n",
        "This notebook provides quick code examples that show you how to get started generating embeddings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "YD6urJjWGVDf"
      },
      "outputs": [],
      "source": [
        "!pip install -U -q google.generativeai # Install the Python SDK"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "yBapI259C99C"
      },
      "outputs": [],
      "source": [
        "import google.generativeai as genai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DJriBaWmkL6Z"
      },
      "source": [
        "## Configure your API key\n",
        "\n",
        "To run the following cell, your API key must be stored it in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see  [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Zey3UiYGDDzU"
      },
      "outputs": [],
      "source": [
        "from google.colab import userdata\n",
        "GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')\n",
        "genai.configure(api_key=GOOGLE_API_KEY)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gGpQ8Eg0kNXW"
      },
      "source": [
        "## Embed content\n",
        "\n",
        "Call the `embed_content` method with the `models/embedding-001` model to generate text embeddings."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "J76TNa3QDwCc"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0.04703258, -0.040190056, -0.029026963, -0.026809 ... TRIMMED]\n"
          ]
        }
      ],
      "source": [
        "text = \"Hello world\"\n",
        "result = genai.embed_content(model=\"models/embedding-001\", content=text)\n",
        "\n",
        "# Print just a part of the embedding to keep the output manageable\n",
        "print(str(result['embedding'])[:50], '... TRIMMED]')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rU6XX33547Ll"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "768\n"
          ]
        }
      ],
      "source": [
        "print(len(result['embedding'])) # The embeddings have 768 dimensions"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BUKqxF9yQuZl"
      },
      "source": [
        "## Batch embed content\n",
        "\n",
        "You can embed a list of multiple prompts with one API call for efficiency."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Hzz-7Heuf4tV"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[-0.0002620658, -0.05592018, -0.012463195, -0.0206 ... TRIMMED]\n",
            "[-0.0151748555, -0.050790474, -0.032357067, -0.058 ... TRIMMED]\n",
            "[0.025271073, -0.064161226, -0.025818137, -0.00611 ... TRIMMED]\n"
          ]
        }
      ],
      "source": [
        "result = genai.embed_content(\n",
        "    model=\"models/embedding-001\",\n",
        "    content=[\n",
        "      'What is the meaning of life?',\n",
        "      'How much wood would a woodchuck chuck?',\n",
        "      'How does the brain work?'])\n",
        "\n",
        "for embedding in result['embedding']:\n",
        "  print(str(embedding)[:50], '... TRIMMED]')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sSKcLGIpo8yc"
      },
      "source": [
        "## Use `task_type` to provide a hint to the model how you'll use the embeddings"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bz0zq1_shk98"
      },
      "source": [
        "Let's look at all the parameters the `embed_content` method takes. There are four:\n",
        "\n",
        "* `model`: Required. Must be `models/embedding-001`.\n",
        "* `content`: Required. The content that you would like to embed.\n",
        "*`task_type`: Optional. The task type for which the embeddings will be used. See below for possible values.\n",
        "* `title`: Optional. You should only set this parameter if your task type is `retrieval_document` (or `document`).\n",
        "\n",
        "`task_type` is an optional parameter that provides a hint to the API about how you intend to use the embeddings in your application.\n",
        "\n",
        "The following task_type parameters are accepted:\n",
        "\n",
        "* `unspecified`: If you do not set the value, it will default to `retrieval_query`.\n",
        "* `retrieval_query` (or `query`): The given text is a query in a search/retrieval setting.\n",
        "* `retrieval_document` (or `document`): The given text is a document from a corpus being searched. Optionally, also set the `title` parameter with the title of the document.\n",
        "* `semantic_similarity` (or `similarity`): The given text will be used for  Semantic Textual Similarity (STS).\n",
        "* `classification`: The given text will be classified.\n",
        "* `clustering`: The embeddings will be used for clustering.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "LFjMapMV91es"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[0.04703258, -0.040190056, -0.029026963, -0.026809 ... TRIMMED]\n",
            "[0.05889487, -0.004501751, -0.067298084, -0.012740 ... TRIMMED]\n"
          ]
        }
      ],
      "source": [
        "# Notice the API returns different embeddings depending on `task_type`\n",
        "result1 = genai.embed_content(\n",
        "    model=\"models/embedding-001\",\n",
        "    content=\"Hello world\")\n",
        "\n",
        "result2 = genai.embed_content(\n",
        "    model=\"models/embedding-001\",\n",
        "    content=\"Hello world\",\n",
        "    task_type=\"document\",)\n",
        "\n",
        "print(str(result1['embedding'])[:50], '... TRIMMED]')\n",
        "print(str(result2['embedding'])[:50], '... TRIMMED]')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tpBm7GIdbkdK"
      },
      "source": [
        "## Learning more\n",
        "\n",
        "Check out these examples in the Cookbook to learn more about what you can do with embeddings:\n",
        "\n",
        "* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.\n",
        "\n",
        "* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.\n",
        "\n",
        "* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.\n",
        "\n",
        "* Embeddings have many applications in Vector Databases, too. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).\n",
        "\n",
        "You can learn more about embeddings in general on ai.google.dev in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide)\n",
        "\n",
        "* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).\n",
        "\n",
        "* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents)."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "Embeddings.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
